Emergency entity relationship extraction for water diversion project based on pre-trained model and multi-featured graph convolutional network

Using information technology to extract emergency decision-making knowledge from emergency plan documents is an essential means of enhancing the efficiency and capacity of emergency management. To address the numerous terminologies and complex relationships encountered in emergency knowledge extraction for water diversion projects, a multi-feature graph convolutional network based on a pre-trained model (PTM-MFGCN) is proposed. First, domain-specific terminologies are randomly masked during pre-training, which improves the model's comprehension of the meaning and application of such terminologies within the field and thereby strengthens the network's ability to extract professional terminology. Second, a multi-feature adjacency matrix is introduced to capture a broader range of neighboring node information, which enhances the network's ability to handle complex relationships. Finally, PTM-MFGCN is used to extract emergency entity relationships for a water diversion project, and a knowledge graph for water diversion emergency management is constructed. The experimental results demonstrate that PTM-MFGCN improves accuracy by 2.84%, recall by 4.87%, and F1 score by 5.18% compared to the baseline model. The approach can enhance the efficiency and capability of emergency management and help mitigate the impact of unforeseen events on engineering safety.


Introduction
The emergency plan for water diversion project plays a significant role in guiding the efficient and orderly implementation of emergency management efforts [1]. With the advancement of water conservancy informatization, the field of emergency management for water diversion project has accumulated a vast array of document-based emergency plans. These plans exhibit of BERT and CNN models gives birth to an innovative methodology for extracting knowledge related to fire emergencies, referred to as the BERT-CNN approach. In the literature [42], a method based on BERT-RNN for identifying emergency entities for earthquake disasters is proposed. In the literature [43], an integration of BERT, BILSTM, and CRF is employed for nested named entity recognition in the field of geological disasters. Although the aforementioned studies have achieved promising results in their respective fields, the pre-training stage in these studies relies primarily on traditional pre-training tasks [44]. In addition, research on GCN commonly considers only the relationships of single-feature neighboring nodes, thereby overlooking the broader information available from neighboring nodes [45,46]. Based on the above discussion, this study proposes a multi-feature GCN method based on a pre-trained model (PTM-MFGCN) to solve the problems of numerous technical terms and complex relationships faced by emergency knowledge extraction for water diversion projects. The main contributions are as follows:
• By randomly masking specialized terminology, the model acquires a deeper comprehension of the meaning and application of such terms within specific domains. Consequently, this elevates the network's proficiency in extracting and recognizing professional terminology;
• By incorporating a multi-feature adjacency matrix, the network can effectively capture a broader range of information from neighboring nodes, thus enhancing its ability to process complex relationships;
• Emergency entity relationships for a water diversion project are extracted with PTM-MFGCN, and an emergency knowledge graph for the water diversion project is constructed.
The remaining sections of this paper are organized as follows. Section 2 describes the progress of work related to PTM-MFGCN. Sections 3 and 4 detail the construction process of PTM-MFGCN and the process of entity relationship extraction. Section 5 presents the experimental protocol, results, and discussion. Finally, Section 6 concludes the paper and indicates future research directions.

Pre-trained language models
Pre-trained language models aim to acquire extensive linguistic knowledge through self-supervised learning on text corpora [47]. Early pre-trained language models relied predominantly on n-gram and rule-based approaches. In the literature [48], the Global Vectors model was proposed, which accomplishes pre-training by incorporating both word co-occurrence statistics and global semantic information. In the literature [49], the Word Vector model was proposed, which achieves pre-training by representing words as continuous vectors through self-supervised learning.
With the advancement of deep learning, the literature [50] introduced the Transformer model, which employs self-attention mechanisms for sequence modeling. The Transformer model effectively captures long-range dependencies in text through its self-attention mechanism, leading to significant performance improvements in machine translation tasks. Subsequently, the literature [35] introduced the BERT model, building upon the foundation of Transformer. The BERT model employs a strategy of pre-training and fine-tuning, conducting extensive pre-training on large-scale unsupervised data and subsequently fine-tuning on downstream tasks. The groundbreaking innovation of the BERT model has yielded exceptional results across multiple natural language processing tasks, sparking widespread scholarly attention towards pre-trained language models.

GNN
GNN, as a deep learning model, is designed for handling graph data [51]. Early research on GNN primarily focused on the extraction of graph features and the learning of representations. The paper [52] introduced the graph recursive neural network (GRN), which employs iterative propagation of node features to learn node representations.
However, the GRN primarily relies on the information from local neighboring nodes and fails to effectively capture global graph structural information. The literature [53] introduced GCN, which extends convolutional operations to the field of graphs, updating node representations by aggregating information from neighboring nodes. The introduction of GCN bridged the gap in graph domain's representation learning, laying the foundation for subsequent research. The literature [54] introduced the graph attention network (GAT) building upon GCN, incorporating attention mechanisms to model the weights between nodes. GAT possesses the capability to dynamically allocate attention weights among different nodes, enabling more precise aggregation of neighboring node information. The introduction of GAT further enhances the modeling capacity for complex graph structures.

Pre-training mechanism
The emergency plan for water diversion project involves a wide range of specialized terminology. By employing pre-trained language models, it is possible to transfer the linguistic knowledge acquired during the pre-training phase to the task of extracting emergency knowledge. During the pre-training phase, PTM-MFGCN randomly masks terminology so that the model better understands the meaning and usage of terminology in a particular domain.
Terminology masking randomly masks different technical terms and asks the model to predict the masked content, which allows the model to learn the contextual features of the terms. Specifically, three levels of masking operations (character-level, entity-level, and phrase-level) are applied to the input text. In character-level masking, 15% of the domain-specific terms are randomly selected for masking. The selected terms have an 80% probability of being replaced with "[MASK]", a 10% probability of being replaced with other terms, and a 10% probability of remaining unchanged. The character-level masking strategy enables the model to learn basic word representations, but it struggles to capture advanced semantic knowledge. Entity-level masking first identifies the specialized term entities in the sentence and then randomly masks these entities, so that the model learns semantic knowledge at the level of whole terms. Phrase-level masking first employs a sentence segmentation tool to extract phrases from the text and then randomly selects a subset of these phrases for masking, thus facilitating the modeling of phrase-level semantic knowledge. Fig 1 illustrates the process of randomly masking professional terms during the pre-training phase.
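The 15% selection and 80/10/10 replacement rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `mask_terms`, the assumption that each domain term has already been segmented into a single token, the per-term Bernoulli selection, and the example vocabulary are all hypothetical choices made for illustration.

```python
import random

MASK = "[MASK]"

def mask_terms(tokens, term_indices, term_vocab, rng, select_prob=0.15):
    """Randomly mask domain terms in a token list.

    tokens       : list of str, one token per word or term (assumption)
    term_indices : positions in `tokens` that hold domain-specific terms
    term_vocab   : pool of terms used for the "replace with another term" case
    Returns the masked token list and a dict {position: original term}
    holding the prediction targets.
    """
    masked = list(tokens)
    targets = {}
    for i in term_indices:
        # Each term is selected independently with probability select_prob.
        if rng.random() >= select_prob:
            continue
        targets[i] = tokens[i]
        roll = rng.random()
        if roll < 0.8:
            # 80%: replace the term with the mask symbol.
            masked[i] = MASK
        elif roll < 0.9:
            # 10%: replace the term with a different term from the pool.
            candidates = [t for t in term_vocab if t != tokens[i]]
            if candidates:
                masked[i] = rng.choice(candidates)
        # Remaining 10%: leave the term unchanged.
    return masked, targets

# Illustrative usage with made-up tokens and terms.
tokens = ["the", "sluice gate", "upstream", "of", "the", "inverted siphon", "is", "closed"]
term_vocab = ["sluice gate", "inverted siphon", "aqueduct", "check gate"]
masked, targets = mask_terms(tokens, [1, 5], term_vocab, random.Random(0))
print(masked, targets)
```

Entity-level and phrase-level masking could reuse the same routine by passing entity or phrase positions instead of single-term positions, provided the text is segmented accordingly.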

MFGCN construction
GCN is an extension of CNN to non-Euclidean spaces [55]. GCN employs convolutional operations to encode local information and learns more global information through message passing across multiple GCN layers. Given a sentence consisting of n words, it is common to construct an adjacency matrix $A \in \mathbb{R}^{n \times n}$ from the syntactic dependency tree [56] to represent the sentence graph. The value of $A_{ij}$ indicates whether the i-th node is connected to the j-th node: if the two nodes are connected, then $A_{ij} = 1$; otherwise, $A_{ij} = 0$. All nodes in the graph are associated with a label, and GCN utilizes the known label information of nodes to predict the labels of unknown nodes for classification. For the i-th node in the l-th layer, the hidden representation is denoted as $h_i^{l}$, and its calculation is expressed by the following formula:

$$h_i^{l} = \sigma\left(\sum_{j=1}^{n} A_{ij} W^{l} h_j^{l-1} + b^{l}\right)$$

where $\sigma$ denotes the activation function, $W^{l}$ represents the weight matrix, and $b^{l}$ represents the bias term. This study introduces four types of linguistic features, which are used to initialize four adjacency matrices. The multi-feature adjacency matrix is composed of the part-of-speech combination matrix $A_{psc}$, the syntactic dependency matrix $A_{sdt}$, the tree-based distance matrix $A_{tbd}$, and the position matrix $A_{rpd}$, whose definitions are illustrated in Fig 2. Based on the four adjacency matrices, the local information is encoded through convolutional operations, resulting in the hidden node representations $H_{psc}$, $H_{sdt}$, $H_{tbd}$, and $H_{rpd}$. The pooling function and the join operation are applied to all hidden layer nodes to obtain the model output, as defined in the following equation.
Among them, $\tau_{pool}$ denotes the pooling function, and $\oplus$ serves as the concatenation operator.
The [CLS] symbol serves as an encoded representation of the entire input sequence, enabling the model to undertake classification and other downstream tasks. Character-level, entity-level, and text-level masking mechanisms are used to conceal specialized terminology within the sentence. The inputs to the pre-trained model are defined as follows.

PLOS ONE
Among them, [MASK] denotes the masked term characters, $E_{[MASK]}$ denotes the masked term entities, $p_{[MASK]}$ represents the masked term context, and $X$ denotes the word vectors extracted using the pre-training model.
Graph convolution computation module: The pre-training module produces a spatial vector representation of the sentence. Based on part-of-speech combination vectors, syntactic dependency vectors, tree distance vectors, and positional vectors, we construct the multi-feature adjacency matrices $A_{psc}$, $A_{sdt}$, $A_{tbd}$, and $A_{rpd}$. Taking $A_{psc}$ as an example, feature extraction is performed using graph convolution and is defined as follows.
Finally, the extracted features are utilized as inputs for the pooling function and the classification function to carry out entity relation extraction. For model training, the cross-entropy function is employed as the loss function, defined as follows.
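The computation described in this section (one graph convolution per feature adjacency matrix, pooling, concatenation, classification, and a cross-entropy loss) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a single GCN layer per feature, ReLU as the activation, max pooling over nodes, pooling before concatenation, and a softmax classifier are all assumptions, and every function and variable name is hypothetical. The identity adjacency matrices in the usage part are placeholders only, since the actual definitions are given in Fig 2.

```python
import numpy as np

FEATURES = ("psc", "sdt", "tbd", "rpd")  # the four feature types named in the text

def gcn_layer(A, H, W, b):
    """One graph convolution: row i equals ReLU(sum_j A[i, j] * (H[j] @ W) + b)."""
    return np.maximum(A @ H @ W + b, 0.0)

def mfgcn_forward(X, adjs, params, W_out, b_out):
    """Return class probabilities for one sentence.

    X      : (n, d) word vectors from the pre-trained model
    adjs   : dict feature name -> (n, n) adjacency matrix
    params : dict feature name -> (W, b) with W of shape (d, hidden)
    """
    pooled = []
    for name in FEATURES:
        W, b = params[name]
        H = gcn_layer(adjs[name], X, W, b)   # (n, hidden)
        pooled.append(H.max(axis=0))         # max pooling over nodes (assumption)
    z = np.concatenate(pooled)               # concatenated features, (4 * hidden,)
    logits = z @ W_out + b_out
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()

def cross_entropy(probs, label):
    """Cross-entropy loss for a single example with an integer class label."""
    return float(-np.log(np.clip(probs[label], 1e-12, 1.0)))

# Illustrative usage with random data and placeholder adjacency matrices.
rng = np.random.default_rng(0)
n, d, hidden, classes = 5, 8, 4, 3
X = rng.normal(size=(n, d))
adjs = {name: np.eye(n) for name in FEATURES}
params = {name: (rng.normal(size=(d, hidden)), np.zeros(hidden)) for name in FEATURES}
W_out, b_out = rng.normal(size=(4 * hidden, classes)), np.zeros(classes)
probs = mfgcn_forward(X, adjs, params, W_out, b_out)
print(probs, cross_entropy(probs, label=1))
```

Keeping one weight matrix per feature lets each adjacency matrix contribute its own view of the sentence graph before the views are joined, which is the idea behind the multi-feature design.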

Emergency entity relationship extraction based on PTM-MFGCN
The study focuses on the comprehensive emergency plans at the provincial and municipal levels, as well as 17 specific emergency plans released between 2014 and 2021 for the South-to-North Water Diversion Project in China. The South-to-North Water Diversion Project is a large-scale inter-basin water transfer project implemented to alleviate water scarcity in the northern region of China. Since its comprehensive implementation, the project has conveyed 58.6 billion cubic meters of water. Based on PTM-MFGCN, emergency entity relationship extraction for water diversion projects consists of three stages: text preprocessing, entity relationship extraction, and storage of knowledge graph triplets. The overall workflow of the method is shown in Fig 4, and the procedure is as follows.
1. Text preprocessing: Initially, employing character-level, entity-level, and phrase-level masking strategies, random masking is applied to professional terminology within the original emergency plan text, followed by the addition of a classification symbol [CLS] at the beginning of the text sequence.
2. Entity relation extraction: Through pre-training tasks with random terminology masking, contextual features and semantic information of specialized terminology are extracted. The output of the pre-training model is the spatial vector representation of the sentence, denoted as $X = [x_1, x_2, \ldots, x_n]$. Based on X, a multi-feature adjacency matrix is constructed, encompassing $A_{psc}$, $A_{sdt}$, $A_{tbd}$, and $A_{rpd}$. The spatial vector X obtained during the pre-training phase is combined with the multi-feature adjacency matrix as the input of the GCN model for feature extraction. Finally, the features extracted by the GCN model are used as inputs to the classification model for entity extraction and relation extraction.
3. Knowledge graph triplet storage: After named entity recognition and relation extraction, the emergency data of the water diversion project has been transformed from unstructured textual data into structured knowledge triplets. The Neo4j graph database is employed for knowledge storage, where the acquired structured knowledge is stored in a semantic network, primarily composed of nodes and edges.
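As a rough illustration of the storage step, the sketch below turns extracted triplets into parameterized Cypher `MERGE` statements that could be sent to a Neo4j database. The node label `Entity`, the relationship type `RELATION`, the property names, and the example triplets are hypothetical; the schema actually used in the study is not given in the text.

```python
def triple_to_cypher(head, relation, tail):
    """Build a parameterized Cypher statement for one (head, relation, tail) triplet.

    MERGE creates the nodes and the edge only if they do not already exist,
    so loading the same triplet twice does not duplicate it.
    """
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        "MERGE (h)-[r:RELATION {type: $relation}]->(t)"
    )
    params = {"head": head, "relation": relation, "tail": tail}
    return query, params

# Made-up triplets, for illustration only.
triples = [
    ("flood emergency process", "has_phase", "graded response"),
    ("post-security", "includes", "communication security"),
]
for head, relation, tail in triples:
    query, params = triple_to_cypher(head, relation, tail)
    print(query, params)
```

Passing values as parameters rather than formatting them into the query string avoids quoting problems with entity names taken from free text.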

Experiment
Entity relation extraction serves as the foundation for constructing a knowledge graph. In this section, we validate the effectiveness of PTM-MFGCN through the task of entity relation extraction on emergency plans for a water diversion project. First, the datasets, baseline models, and evaluation indexes used in the experiment are explained. The experiment is then divided into four groups, each with a distinct objective.
1. Entity relationships are extracted from the emergency plan data of the water diversion project using the PTM-MFGCN model. The results are compared with those of previous state-of-the-art models to assess the effectiveness of PTM-MFGCN.
2. A hyperparameter sensitivity analysis is conducted to verify how sensitive the model is to hyperparameter settings and to evaluate its performance.
3. The impact of the term-masking pre-training task and of multi-feature graph convolution on the performance of PTM-MFGCN is analysed through ablation experiments.
4. Based on actual emergency scenarios, the emergency knowledge graph constructed in this study is evaluated through case retrieval. This assessment determines whether the proposed method can accurately acquire pertinent information to underpin emergency decision-making.

Data set and baseline model
This study utilizes the comprehensive emergency plans at the provincial and municipal levels, as well as 17 specific emergency plans released from 2014 to 2021, pertaining to the South-to-North Water Diversion Project in China, as the experimental data. The emergency plans contain a total of 790,000 entity-relation triples covering four phases: forecast and warning, graded response, emergency disposal, and post-security. Table 1 provides some examples of emergency plans. The emergency plan data is divided into training and testing sets at a ratio of 7:3. The baseline models selected for this study are widely used named entity recognition models and relation extraction models. The baseline models include BILSTM [21], FastText [23], TextCNN [27], BERT-CNN [41], and BERT-BILSTM-CRF [43,44]. Table 2 presents the parameter configurations for each model.
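A 7:3 partition such as the one described above can be sketched in a few lines of Python. The shuffling step, the fixed seed, and the function name `split_7_3` are assumptions made for illustration; the text does not state how the samples were assigned to each set.

```python
import random

def split_7_3(samples, seed=0):
    """Shuffle the samples and split them into training and testing sets at 7:3."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility (assumption)
    cut = len(shuffled) * 7 // 10           # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

train, test = split_7_3(range(10))
print(len(train), len(test))
```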

Evaluation indexes
For model evaluation, this study employs accuracy, recall, and F1 score as the evaluation indexes. Accuracy (P) is the proportion of predicted entities that are actual entities, that is, the precision of the predictions. Recall (R) is the proportion of actual entities that the model correctly predicts; it measures the model's ability to identify positive instances. The F1 score is the harmonic mean of accuracy and recall and is used to evaluate the overall performance of a model. The higher the values of accuracy, recall, and F1 score, the better the performance of the model. Accuracy (P), recall (R), and F1 score are defined as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R}$$

where $TP$ denotes the number of correctly predicted entities, $FP$ denotes the number of predicted entities that are not actual entities, and $FN$ denotes the number of entities that were not predicted. Furthermore, this study takes actual engineering emergency scenarios into account and employs knowledge usability as a qualitative index to appraise the emergency knowledge graph constructed in this paper. This assessment ascertains whether the proposed method can precisely acquire valuable information to support emergency decision-making.
Table 3 presents the experimental results of the different models. In terms of entity extraction, FastText performed the poorest, with precision, recall, and F1 values of 78.03, 38.60, and 42.52, respectively. FastText utilizes character-level n-gram features to represent text, which limits its ability to comprehend complex semantics. BERT-BILSTM-CRF performed better, with an accuracy, recall, and F1 of 93.49, 94.87, and 94.09, respectively.
Based on pre-training and a bidirectional language model, BERT-BILSTM-CRF can model the relationships between vocabulary, syntax, and semantics in depth, thereby providing a comprehensive contextual representation. The PTM-MFGCN model proposed in this paper achieves an accuracy, recall, and F1 score of 94.65, 97.28, and 95.94, respectively. Compared to BERT-BILSTM-CRF, the PTM-MFGCN model achieves a relative improvement of 1.24% in accuracy, 2.54% in recall, and 1.97% in F1 score. The emergency plans for water diversion projects contain a large amount of specialized terminology. Because traditional pre-training models cannot perform targeted pre-training, BERT-BILSTM-CRF exhibits weaker extraction capabilities for emergency knowledge in water diversion projects than PTM-MFGCN. In terms of relation extraction, BERT-BILSTM-CRF achieves accuracy, recall, and F1 scores of 81.56%, 78.67%, and 78.06%, respectively, while PTM-MFGCN achieves accuracy, recall, and F1 scores of 85.18%, 84.33%, and 84.60%, respectively. Compared to BERT-BILSTM-CRF, PTM-MFGCN achieves a relative improvement of 4.43% in accuracy, 7.19% in recall, and 8.38% in F1 score. The emergency plans for water diversion projects are characterized by complex relationships. The graph convolution-based relation extraction model proposed in this paper handles complex graph-structured data better, which enhances its capability in this domain. In general, the PTM-MFGCN proposed in this paper surpasses widely employed state-of-the-art models in terms of entity relation extraction results. This indicates that PTM-MFGCN can effectively accomplish the task of emergency entity relation extraction for water diversion projects.
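The evaluation indexes and the reported gains can be checked with a short Python sketch. The helper names are hypothetical, and the counts passed to `precision_recall_f1` are made-up values for illustration. Reading the reported improvements as relative gains, (new - old) / old, is an interpretation; it reproduces the figures quoted above for entity extraction to within rounding, whereas plain differences between the scores do not.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute P, R, and F1 from true positives, false positives, and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Made-up counts, for illustration only.
print(precision_recall_f1(tp=8, fp=2, fn=2))

# Entity extraction scores quoted in the text: PTM-MFGCN vs. BERT-BILSTM-CRF.
for name, new, old in [("accuracy", 94.65, 93.49),
                       ("recall", 97.28, 94.87),
                       ("F1", 95.94, 94.09)]:
    print(name, round(relative_gain(new, old), 2))
```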

Hyperparameter sensitivity analysis
To investigate the impact of hyperparameters on the performance of the PTM-MFGCN model, this study conducts a detailed analysis and comparison of several crucial hyperparameters, including the number of network layers, the learning rate, and the number of iterative rounds. 140, 150}. For the sake of experimental fairness, apart from the hyperparameter under investigation, the remaining hyperparameters are set the same as in Section 5.1. The experimental results are illustrated in Fig 6. Fig 6A to 6C give the relationship between the number of network layers, the number of iterative rounds, and the evaluation indexes. When the number of network layers is 6 or 7, the overall performance of the model is good. When the number of network layers is 1 to 3, the model underfits and its overall performance is poor. When the number of network layers is 13 to 15, the model appears to overfit and its overall performance decreases. The experimental results indicate that the number of iterative rounds and the number of network layers are sensitive parameters. When the number of iterative rounds or the number of network layers is too small, underfitting may result, in which the model fails to adequately express the rich semantic information of all entity relationships. Conversely, when the number of iterative rounds or the number of network layers is too large, overfitting may arise, leading to a decline in performance. Fig 6D to 6F give the relationship between the learning rate, the number of iterative rounds, and the evaluation indexes. The model performs well when the number of iterative rounds is between 10 and 60 and the learning rate is within the range of 1e-5 to 0.001. The model also performs well when the number of iterative rounds is between 70 and 120 and the learning rate is within the range of 5e-6 to 0.005.
As the number of iterative rounds or the learning rate increases, the evaluation indexes of the model first tend to increase, then stabilise, and then start to slowly decrease. The experimental results similarly demonstrate that the learning rate and the number of iterative rounds are sensitive parameters, and their excessively small or large values may give rise to the problems of underfitting or overfitting in the model.

Ablation experiments
To assess the impact of the term-masking pre-training task and of multi-feature graph convolution on model performance, two variant models, namely PTM-MFGCN-CUT and PTM-MFGCN-UNI, were designed based on the PTM-MFGCN model. PTM-MFGCN-CUT removes the term-masking pre-training from the PTM-MFGCN model, while PTM-MFGCN-UNI removes the multi-feature graph convolution. The average precision, recall, and F1 score of the PTM-MFGCN model and its variants on the entity relation extraction task are presented in Table 4.
From Table 3 17.97% in recall, and 34.27% in F1 score. Because term-masking pre-training is removed in the PTM-MFGCN-CUT model, its ability to extract specialized terminology from emergency plans is diminished, resulting in a decline in model performance. Compared to PTM-MFGCN-UNI, PTM-MFGCN achieves an improvement of 6.10% in precision, 10.56% in recall, and 10.72% in F1 score. PTM-MFGCN-UNI, which adopts a single-feature strategy, is less able to extract complex relationships, resulting in an overall decline in model performance.

Case retrieval experiments for emergency knowledge graphs
Taking "Emergency process of flooding in South-to-North Water Diversion Project" as an example, the results of searching the emergency knowledge graph are shown in Table 5 and Fig 7. The emergency process of flooding in the South-to-North Water Diversion Project is divided into forecast and early warning, graded response, emergency disposal, and post-security. Among them, forecast and early warning contains knowledge related to the collection and release of warning information. Graded response contains the Level I to IV emergency response standards. Emergency disposal includes disposal measures and disposal requirements. Post-security includes team security, communication security, and power security. According to the information presented in Table 5 and Fig 7, the flood emergency process is largely covered by the returned results, and the usability of the knowledge is high. Combined with the actual situation of the South-to-North Water Diversion Project, the current search results can basically meet the demand for accurate delivery of emergency knowledge. For the small amount of correct content not covered in the recommendation list, the main reasons for recommendation failure are as follows. ① Annotation errors may occur during manual data labeling, which can make it difficult for the entity recognition algorithm to identify certain entities accurately. ② The training data for the entity recognition model is insufficient, which leads to an inability to identify some pertinent entities accurately.
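The kind of case retrieval described above can be mimicked on a small in-memory set of triplets. The sketch below is only an illustration: the entity names follow the phases listed in the text, but the relation names `has_phase` and `includes`, the breadth-first traversal, and the hop limit are assumptions, and the study itself queries a Neo4j graph rather than a Python dictionary.

```python
from collections import defaultdict, deque

# Illustrative triplets built from the phases named in the text.
TRIPLES = [
    ("flood emergency process", "has_phase", "forecast and early warning"),
    ("flood emergency process", "has_phase", "graded response"),
    ("flood emergency process", "has_phase", "emergency disposal"),
    ("flood emergency process", "has_phase", "post-security"),
    ("emergency disposal", "includes", "disposal measures"),
    ("emergency disposal", "includes", "disposal requirements"),
    ("post-security", "includes", "team security"),
    ("post-security", "includes", "communication security"),
    ("post-security", "includes", "power security"),
]

def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) pairs."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index

def retrieve(index, start, max_hops=2):
    """Breadth-first retrieval of all triplets within max_hops of the start entity."""
    results = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, tail in index.get(node, []):
            results.append((node, relation, tail))
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, depth + 1))
    return results

index = build_index(TRIPLES)
for triple in retrieve(index, "flood emergency process"):
    print(triple)
```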

Conclusion
1. In this study, a pre-training task based on terminology masking is proposed to address the problem of numerous terminologies in water diversion project emergency plan texts. Compared to BERT-BILSTM-CRF, PTM-MFGCN improves accuracy, recall, and F1 score by 1.24%, 2.54%, and 1.97%, respectively. This enables it to effectively perform the task of extracting emergency entities in water diversion projects.
2. To address the issue of complex relationships in emergency plan texts, a relationship extraction method based on multi-feature graph convolution has been devised. The experimental results demonstrate that PTM-MFGCN achieves an accuracy, recall, and F1 score of 85.18%, 84.33%, and 84.60%, respectively. The model exhibits superior overall performance, showcasing its robust capability in handling complex graph-structured data.